-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(file-based): new AbstractFileBasedStreamPermissionsReader #402
feat(file-based): new AbstractFileBasedStreamPermissionsReader #402
Conversation
…getting ACLs and Identities
📝 WalkthroughWalkthroughThis pull request introduces a new abstract class, Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant FileBasedSource
participant PermissionsFileBasedStream
participant FileIdentitiesStream
participant PermissionsReader
Client->>FileBasedSource: Initialize(stream_permissions_reader)
FileBasedSource->>PermissionsFileBasedStream: _make_permissions_stream(stream_permissions_reader)
FileBasedSource->>FileIdentitiesStream: _make_identities_stream(stream_permissions_reader)
FileIdentitiesStream->>PermissionsReader: load_identity_groups()
PermissionsFileBasedStream->>PermissionsReader: get_file_acl_permissions(file)
Possibly related PRs
Suggested reviewers
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (10)
airbyte_cdk/sources/file_based/stream/identities_stream.py (3)
29-29
: Add docstring for new parameter?
Would you like to expand the docstring to clarify this parameter's role (e.g., describing what the caller should pass in)? wdyt?
35-35
: Consider renaming the attribute.
Would you consider prefixing it with an underscore (e.g.,_stream_permissions_reader
) if it's intended to be private? wdyt?
45-45
: Loading large identity groups?
If you expect a lot of identities, might you want partial fetching or pagination to avoid large in-memory sets? wdyt?airbyte_cdk/sources/file_based/file_based_stream_permissions_reader.py (7)
5-8
: Imports are minimal and clear.
No issues here, though if you need advanced ABC features, you might considertyping_extensions
. wdyt?
9-9
: Use of RemoteFile
Is there a possibility you might also read from local files in future? If so, do you foresee expanding or renaming this class? wdyt?
12-15
: Class docstring looks helpful.
Would you consider adding clarifications on how logger should be provided and used in each method? wdyt?
17-34
: Consider typed return for ACL permissions.
Would you consider returning a dataclass or typed model instead of a generic Dict for future type-checking? wdyt?
36-58
: Potential pagination approach
If there could be many identities, might you want to handle partial fetching to avoid memory spikes? wdyt?
60-84
: Schema location
Would you store this schema in a separate JSON resource for easier updates and versioning, rather than in code? wdyt?
86-109
: Collaboration with file_permissions_schema
Might you share definitions across both permission and identity schemas (e.g., common fields) to reduce duplication? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
airbyte_cdk/sources/file_based/file_based_source.py
(5 hunks)airbyte_cdk/sources/file_based/file_based_stream_permissions_reader.py
(1 hunks)airbyte_cdk/sources/file_based/file_based_stream_reader.py
(0 hunks)airbyte_cdk/sources/file_based/stream/identities_stream.py
(3 hunks)airbyte_cdk/sources/file_based/stream/permissions_file_based_stream.py
(4 hunks)unit_tests/sources/file_based/stream/test_file_identities_stream.py
(3 hunks)unit_tests/sources/file_based/stream/test_permissions_file_based_stream.py
(5 hunks)
💤 Files with no reviewable changes (1)
- airbyte_cdk/sources/file_based/file_based_stream_reader.py
🧰 Additional context used
🪛 GitHub Actions: Linters
airbyte_cdk/sources/file_based/file_based_source.py
[error] 106-106: Incompatible default for argument 'stream_permissions_reader' (default has type 'None', argument has type 'AbstractFileBasedStreamPermissionsReader'). PEP 484 prohibits implicit Optional.
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Analyze (python)
🔇 Additional comments (24)
unit_tests/sources/file_based/stream/test_permissions_file_based_stream.py (5)
17-19
: Import looks good, aligned with the PR objective.The import of
AbstractFileBasedStreamPermissionsReader
properly sets up the test to work with the new permissions reader class.
59-59
: Good addition of the permissions reader mock.This adds the necessary mock for the new permissions reader, keeping the test aligned with the implementation changes.
67-67
: Setting schema on permissions reader is correct.Moving the schema setting from stream_reader to stream_permissions_reader aligns with the new responsibility separation.
79-79
: Proper constructor update.Adding the stream_permissions_reader to the constructor ensures the test creates the stream with the correct dependencies.
83-83
: Correctly updated test methods to use permissions reader.The test methods now properly call methods on the new permissions reader mock rather than the stream reader, maintaining test functionality while adapting to the new implementation.
Also applies to: 110-112, 125-125
airbyte_cdk/sources/file_based/stream/permissions_file_based_stream.py (5)
10-12
: Import of new permissions reader class looks good.The import is properly structured and aligns with the PR objective of introducing a dedicated class for permissions handling.
32-37
: Clean implementation of the updated constructor.The constructor now properly accepts and stores the permissions reader, maintaining backward compatibility with the **kwargs pattern while adding the new required parameter.
41-41
: Good refactoring of schema retrieval.Switched from using stream_reader to stream_permissions_reader for schema retrieval, properly aligning with the new responsibility distribution.
52-54
: Properly updated permissions retrieval.Now using the specialized permissions reader to get file ACL permissions, which aligns with the separation of concerns.
94-94
: Consistent schema retrieval update.The _get_raw_json_schema method now also retrieves the schema from the permissions reader, maintaining consistency across all schema-related methods.
unit_tests/sources/file_based/stream/test_file_identities_stream.py (6)
15-17
: Import for permissions reader looks good.The import is properly structured and matches the implementation changes.
64-64
: Good replacement of stream reader with permissions reader.This change correctly updates the test to use the new permissions reader mock instead of the generic stream reader.
67-67
: Correctly set schema on permissions reader.The test now sets the identity schema on the permissions reader, aligning with implementation changes.
71-71
: Updated constructor parameter correctly.FileIdentitiesStream constructor is now called with stream_permissions_reader instead of stream_reader, matching the implementation changes.
77-80
: Test correctly updated to use permissions reader.The test now calls load_identity_groups on the permissions reader mock instead of the stream reader.
92-94
: Exception handling properly updated.The test now raises exceptions from the permissions reader mock, maintaining test coverage for error cases.
airbyte_cdk/sources/file_based/file_based_source.py (5)
51-53
: Import for permissions reader looks good.The import is properly structured and aligns with the PR objective.
109-109
: Good storage of permissions reader in instance variable.Properly stores the permissions reader for later use in stream creation methods.
348-350
: Nice addition of docstring.Adding a clear docstring to _make_permissions_stream helps explain its purpose and improves code documentation.
361-361
: Properly updated stream creation.Now passing the permissions reader to the PermissionsFileBasedStream constructor, aligning with the implementation changes.
384-384
: Correctly updated identities stream creation.Now using stream_permissions_reader for FileIdentitiesStream, maintaining consistency across the codebase.
airbyte_cdk/sources/file_based/stream/identities_stream.py (2)
11-13
: Imports look good!
No issues here; everything seems in order. wdyt?
49-49
: Validate identities schema?
Would you like to confirm that this schema is aligned with what the rest of the connector expects by adding a quick integration check? wdyt?airbyte_cdk/sources/file_based/file_based_stream_permissions_reader.py (1)
1-4
: License header looks proper.
Everything seems correct with the 2025 attribution. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/file_based/file_based_source.py (1)
386-389
: Consider reducing code duplicationThis validation check is almost identical to the one in
_make_permissions_stream
. Would it make sense to extract this common validation to a helper method to avoid duplication, perhaps called_ensure_permissions_reader_available()
? wdyt?- if not self.stream_permissions_reader: - raise ValueError( - "Stream permissions reader is required for streams that use permissions transfer mode." - ) + self._ensure_permissions_reader_available() + def _ensure_permissions_reader_available(self): + """ + Validates that a stream permissions reader is available. + Raises a ValueError if the reader is not provided. + """ + if not self.stream_permissions_reader: + raise ValueError( + "Stream permissions reader is required for streams that use permissions transfer mode." + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/file_based/file_based_source.py
(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (python)
🔇 Additional comments (6)
airbyte_cdk/sources/file_based/file_based_source.py (6)
51-53
: Clean import additionThe import of
AbstractFileBasedStreamPermissionsReader
is well placed with related imports. This aligns with the PR objective of introducing a new permissions reader class.
106-106
: Type annotation looks goodThe parameter is correctly typed as
Optional[AbstractFileBasedStreamPermissionsReader]
with a default value ofNone
. This resolves the previous linting issue that was flagged in a past review.
109-109
: Instance variable assignment is appropriateThe assignment to
self.stream_permissions_reader
follows the existing pattern in the class and is placed logically with other similar assignments.
348-354
: Good error handling with clear messageThe validation check and error message provide clear guidance when the permissions reader is missing. The docstring also clearly explains the method's purpose.
365-365
: Parameter update matches new architecturePassing
stream_permissions_reader
to thePermissionsFileBasedStream
constructor aligns with the refactoring goal of centralizing permission-checking logic.
392-392
: Parameter name updated correctlyThe parameter was changed from
stream_reader
tostream_permissions_reader
which matches the new design that uses a dedicated permissions reader class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
airbyte_cdk/sources/file_based/file_based_stream_permissions_reader.py (4)
34-35
: Docstring references incorrect class nameI noticed the docstring refers to
AbstractFileBasedStreamReader
but this class is namedAbstractFileBasedStreamPermissionsReader
. Should we update this to reference the correct class name? wdyt?- Therefore, concrete implementations of AbstractFileBasedStreamReader's config setter should assert that `value` is of the correct + Therefore, concrete implementations of AbstractFileBasedStreamPermissionsReader's config setter should assert that `value` is of the correct
64-64
: Return type mismatch in example docstringThe example method signature returns
Dict[str, Any]
but the actual method signature returnsIterable[Dict[str, Any]]
. Should we make these consistent? wdyt?- def load_identity_groups(self, logger: logging.Logger) -> Dict[str, Any]: + def load_identity_groups(self, logger: logging.Logger) -> Iterable[Dict[str, Any]]:
90-90
: Small typo in docstringsI noticed "patter" in the docstring which should probably be "pattern". Would you like to fix this? wdyt?
- # you can also follow the patter we have for python connectors and have a json file and read from there e.g. schemas/identities.json + # you can also follow the pattern we have for python connectors and have a json file and read from there e.g. schemas/identities.jsonAlso applies to: 116-116
125-126
: Trailing comma in schema definitionThere's a trailing comma inside the schema definition (line 125) which might cause issues in some JSON parsers. Would you like to remove it? wdyt?
"name": { "type": ["null", "string"] }, "email_address": { "type": ["null", "string"] }, "member_email_addresses": { "type": ["null", "array"] }, - "type": { "type": "string" }, + "type": { "type": "string" } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/file_based/file_based_source.py
(6 hunks)airbyte_cdk/sources/file_based/file_based_stream_permissions_reader.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/file_based/file_based_source.py
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/file_based/file_based_stream_permissions_reader.py (1)
1-132
: Great job on implementing this new abstract class!The
AbstractFileBasedStreamPermissionsReader
class looks well-structured with clear abstract methods and detailed docstrings. The examples will be particularly helpful for developers implementing concrete subclasses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
airbyte_cdk/sources/file_based/file_based_stream_permissions_reader.py (5)
1-4
: Check the copyright yearI noticed the copyright year is set to 2025. Would it make more sense to use the current year 2024 instead? Just checking if this was intentional or if we should adjust it, wdyt?
13-17
: The class docstring could be more descriptiveThe current docstring is quite brief. Perhaps we could enhance it to provide more context about how this class fits into the architecture and its relationship with other components in the system? Something like:
"This abstract class defines the interface for reading file permissions and identity information from a source. It's used by FileBasedSource to handle permission-related operations separately from stream reading functionality, allowing for better separation of concerns."
What do you think about expanding it this way?
34-36
: Fix typo in docstringThere's a small typo in the docstring - "AbstractFileBasedStreamPermissionsReader's's" has an extra "'s".
- Therefore, concrete implementations of AbstractFileBasedStreamPermissionsReader's's config setter should assert that `value` is of the correct + Therefore, concrete implementations of AbstractFileBasedStreamPermissionsReader's config setter should assert that `value` is of the correctWhat do you think?
64-77
: Inconsistency in example variablesI noticed a few inconsistencies in the example:
- Line 72 uses
groups.name
but it should likely begroup.name
to match the variable name- Line 72 references
user.email
but should likely begroup.email
- Line 68 refers to
self.google_directory_service
which seems like an implementation-specific reference rather than a generic exampleWould it make sense to standardize these variable names for clarity, wdyt?
- members_api = self.google_directory_service.members() + members_api = api_conn.members()- group_obj = my_identity_model(id=group.id, name=groups.name, email_address=user.email, type="group").dict() + group_obj = my_identity_model(id=group.id, name=group.name, email_address=group.email, type="group").dict()
115-128
: Remove trailing comma in JSON schemaThere's a trailing comma after the "type" property in the schema definition. While this is valid in Python literals, it's not valid in strict JSON.
"email_address": { "type": ["null", "string"] }, "member_email_addresses": { "type": ["null", "array"] }, - "type": { "type": "string" }, + "type": { "type": "string" }Would you like to remove it for consistency with standard JSON format?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/file_based/file_based_stream_permissions_reader.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (3)
airbyte_cdk/sources/file_based/file_based_stream_permissions_reader.py (3)
45-53
: Good example implementationThe example implementation is clear and helpful for understanding how to implement this method. I like the use of concrete examples to guide implementers.
89-103
: Good schema exampleThe schema example is well-structured and provides a clear template for implementers to follow. The comments about alternative approaches (reading from a JSON file) are also helpful.
1-132
: Overall good abstract class designThe abstract class is well-designed with clear method signatures, detailed docstrings, and helpful examples. The use of abstract methods and properties with informative error messages ensures that implementers will understand what they need to provide.
The separation of concerns between file permissions and stream reading is a good architectural decision that should improve maintainability. Nice work on this implementation!
airbyte_cdk/sources/file_based/file_based_stream_permissions_reader.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
airbyte_cdk/sources/file_based/file_based_stream_permissions_reader.py (4)
34-34
: Typographical edit needed.
There's a minor double's's
in the docstring that may confuse readers. Would you like to remove one of the's
to clarify?-Therefore, concrete implementations of AbstractFileBasedStreamPermissionsReader's's config setter +Therefore, concrete implementations of AbstractFileBasedStreamPermissionsReader's config setter
39-55
: Clarify example references.
The sample snippet mentionssome_api
andSOME_CREDENTIALS
as placeholders. Would you like to explicitly mark these as pseudo code to avoid confusion, or remove them if they are purely illustrative? wdyt?
78-101
: Consider referencing an external JSON schema.
You mention that reading from a JSON file (likeschemas/file_permissions.json
) is a pattern followed in Python connectors. Would you like to adopt that approach here for maintainability? wdyt?
102-124
: Consistent approach for schemas.
Similarly, for the identities schema, do you plan to keep it inline or move it to a dedicated JSON file for consistency? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/file_based/file_based_stream_permissions_reader.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Analyze (python)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (1)
airbyte_cdk/sources/file_based/file_based_stream_permissions_reader.py (1)
13-16
: Nicely introduced abstract class.
This abstract class effectively sets the stage for file permission logic. Great approach!
What
Introduced an
AbstractFileBasedStreamPermissionsReader
class to handle permission-related tasks for file-based streams.This should fix to have to make noops like here ISP!!
I already have two PRs waiting for this change:
How
AbstractFileBasedStreamPermissionsReader
, to encapsulate permission-checking logic.Review guide
Here are the relevant files to review:
airbyte_cdk/sources/file_based/file_based_source.py
:airbyte_cdk/sources/file_based/file_based_stream_permissions_reader.py
:AbstractFileBasedStreamPermissionsReader
and its core permission-handling logic.airbyte_cdk/sources/file_based/file_based_stream_reader.py
:airbyte_cdk/sources/file_based/stream/identities_stream.py
:airbyte_cdk/sources/file_based/stream/permissions_file_based_stream.py
:These files reflect the core changes introduced in this pull request, focusing on the integration of the
AbstractFileBasedStreamPermissionsReader
across relevant file-based stream implementations.User Impact
Users should experience no changes in functionality. This refactoring is internal and does not alter any external interfaces or behaviors.
Can this PR be safely reverted and rolled back?
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Refactor
Tests